
    NAMD: biomolecular simulation on thousands of processors

    NAMD is a fully featured, production molecular dynamics program for high-performance simulation of large biomolecular systems. We have previously, at SC2000, presented scaling results for simulations with cutoff electrostatics on up to 2048 processors of the ASCI Red machine, achieved with an object-based hybrid force and spatial decomposition scheme and an aggressive measurement-based predictive load balancing framework. We extend this work by demonstrating similar scaling on the much faster processors of the PSC Lemieux Alpha cluster, and for simulations employing efficient (order N log N) particle mesh Ewald full electrostatics. This unprecedented scalability in a biomolecular simulation code has been attained through latency tolerance, adaptation to multiprocessor nodes, and the direct use of the Quadrics Elan library in place of MPI by the Charm++/Converse parallel runtime system.
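
    The central scaling idea in the cutoff case, binning atoms into spatial cells at least one cutoff wide so that each atom only needs to examine its own and neighboring cells, can be sketched briefly. The C++ below is purely illustrative (hypothetical names, serial, no periodic boundaries); it is not NAMD's actual patch/compute code.

        // Illustrative sketch: bin atoms into cells at least one cutoff wide,
        // then examine only same-cell and neighbor-cell pairs.
        #include <cstdio>
        #include <vector>

        struct Atom { double x, y, z; };

        int main() {
            const double cutoff = 12.0;              // pairwise cutoff distance
            const double box    = 48.0;              // cubic box edge (no periodicity)
            const int    n      = int(box / cutoff); // cells per dimension
            std::vector<Atom> atoms = {{1, 1, 1}, {2, 3, 2}, {13, 1, 1}, {30, 30, 30}};

            // Bin atom indices into n*n*n cells.
            std::vector<std::vector<int>> cell(n * n * n);
            auto idx = [&](int ix, int iy, int iz) { return (ix * n + iy) * n + iz; };
            for (int a = 0; a < (int)atoms.size(); ++a)
                cell[idx(int(atoms[a].x / cutoff), int(atoms[a].y / cutoff),
                         int(atoms[a].z / cutoff))].push_back(a);

            auto within = [&](int a, int b) {
                double dx = atoms[a].x - atoms[b].x, dy = atoms[a].y - atoms[b].y,
                       dz = atoms[a].z - atoms[b].z;
                return dx * dx + dy * dy + dz * dz < cutoff * cutoff;
            };

            // Each (cell, neighbor-cell) pair is an independent unit of work; a
            // parallel MD code would distribute these units across processors.
            long pairs = 0;
            for (int ix = 0; ix < n; ++ix)
              for (int iy = 0; iy < n; ++iy)
                for (int iz = 0; iz < n; ++iz)
                  for (int jx = ix - 1; jx <= ix + 1; ++jx)
                    for (int jy = iy - 1; jy <= iy + 1; ++jy)
                      for (int jz = iz - 1; jz <= iz + 1; ++jz) {
                        if (jx < 0 || jy < 0 || jz < 0 || jx >= n || jy >= n || jz >= n)
                            continue;
                        for (int a : cell[idx(ix, iy, iz)])
                            for (int b : cell[idx(jx, jy, jz)])
                                if (a < b && within(a, b)) ++pairs;
                      }
            std::printf("atom pairs within cutoff: %ld\n", pairs);
            return 0;
        }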

    Argobots: A Lightweight Low-Level Threading and Tasking Framework

    In the past few decades, a number of user-level threading and tasking models have been proposed in the literature to address the shortcomings of OS-level threads, primarily with respect to cost and flexibility. Current state-of-the-art user-level threading and tasking models, however, are either too specific to particular applications or architectures or lack the needed power and flexibility. In this paper, we present Argobots, a lightweight, low-level threading and tasking framework designed as a portable and performant substrate for high-level programming models or runtime systems. Argobots offers a carefully designed execution model that balances generality with a rich set of controls that allow specialization by end users or by high-level programming models. We describe the design, implementation, and performance characterization of Argobots and present integrations with three high-level models: OpenMP, MPI, and colocated I/O services. Evaluations show that (1) Argobots, while providing richer capabilities, is competitive with existing simpler generic threading runtimes; (2) our OpenMP runtime offers more efficient interoperability capabilities than production OpenMP runtimes do; (3) when MPI interoperates with Argobots instead of Pthreads, it enjoys reduced synchronization costs and better latency hiding; and (4) I/O services built on Argobots reduce interference with colocated applications while achieving performance competitive with that of a Pthreads-based approach.
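
    As a concrete illustration, creating and joining a few user-level threads (ULTs) on the current execution stream typically looks like the following. This is a minimal sketch based on the basic Argobots C API as presented in its tutorials; treat the exact calls and signatures as assumptions rather than a verified example.

        // Minimal Argobots sketch: spawn ULTs into the current execution stream's
        // main pool, then join and free them. Signatures assumed from the C API.
        #include <abt.h>
        #include <cstdio>

        static void hello(void *arg) {
            std::printf("hello from ULT %d\n", (int)(size_t)arg);
        }

        int main(int argc, char **argv) {
            ABT_init(argc, argv);                        // initialize the runtime

            ABT_xstream xstream;
            ABT_xstream_self(&xstream);                  // the primary execution stream
            ABT_pool pool;
            ABT_xstream_get_main_pools(xstream, 1, &pool);

            ABT_thread threads[4];
            for (int i = 0; i < 4; i++)                  // lightweight, user-scheduled threads
                ABT_thread_create(pool, hello, (void *)(size_t)i,
                                  ABT_THREAD_ATTR_NULL, &threads[i]);
            for (int i = 0; i < 4; i++) {
                ABT_thread_join(threads[i]);
                ABT_thread_free(&threads[i]);
            }

            ABT_finalize();
            return 0;
        }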

    The reduce-or process model for parallel execution of logic programs

    A method for parallel execution of logic programs is presented. It uses REDUCE-OR trees instead of AND-OR or SLD trees. The REDUCE-OR trees represent logic-program computations in a manner suitable for parallel interpretation. The REDUCE-OR process model is derived from the tree representation by providing a process interpretation of tree development and by devising efficient bookkeeping mechanisms and algorithms. The process model is complete (it eventually produces any particular solution) and extracts full OR parallelism. This is in contrast to most other schemes that extract AND parallelism. It does this by effectively solving the problem of interaction between AND and OR parallelism. An important optimization that effectively controls the apparent overhead in the process model is given. Techniques that trade parallelism for reduced overhead are also described.
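
    A toy sketch of the OR parallelism being discussed (not the REDUCE-OR bookkeeping itself; all names below are hypothetical): the alternative clauses matching a goal are explored as independent concurrent tasks whose solution streams are then merged.

        // Toy illustration: OR parallelism explores alternative clauses matching a
        // goal concurrently; each branch yields its own solutions, which are merged.
        #include <cstdio>
        #include <future>
        #include <string>
        #include <vector>

        // A hypothetical "goal" node: it either succeeds with a binding or splits
        // into alternative sub-goals (an OR node in the search tree).
        struct Goal {
            std::string solution;            // non-empty => this branch succeeds
            std::vector<Goal> alternatives;  // OR branches to try
        };

        // Explore all OR branches; each alternative runs as its own task, so
        // independent branches of the search tree proceed in parallel.
        std::vector<std::string> solve(const Goal& g) {
            std::vector<std::string> results;
            if (!g.solution.empty()) results.push_back(g.solution);
            std::vector<std::future<std::vector<std::string>>> branches;
            for (const Goal& alt : g.alternatives)
                branches.push_back(std::async(std::launch::async, solve, std::cref(alt)));
            for (auto& b : branches)
                for (const std::string& s : b.get()) results.push_back(s);
            return results;
        }

        int main() {
            // ancestor(alice, X): one branch finds X = bob directly; the other
            // reduces through an intermediate goal and then finds X = carol.
            Goal query{"", {Goal{"X = bob", {}}, Goal{"", {Goal{"X = carol", {}}}}}};
            for (const std::string& s : solve(query)) std::printf("%s\n", s.c_str());
            return 0;
        }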

    Application Oriented And Computer Science Centered Hpcc Research

    At this time, there is a perception of a backlash against the HPCC program, and even against the idea of massively parallel computing itself. In preparation for defining an agenda for HPCC, this paper first analyzes the reasons for this backlash. Although beset with unrealistic expectations, parallel processing will be a beneficial technology with a broad impact, beyond applications in science. However, this will require significant advances and work in computer science, in addition to the parallel hardware and end applications that are currently emphasized. The paper presents a possible agenda that could lead to a successful HPCC program in the future.

    1 Introduction

    It is clear that amid the excitement about the emerging high performance computing technology, a backlash of sorts is developing. This backlash is against the HPCC program as well as the idea of massively parallel computing itself. Ken Kennedy, a leading researcher in parallel computing, recently wrote an article titled "High Perfor..

    A fault tolerance protocol with fast fault recovery

    Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, roll back all processors to previous checkpoints after a crash. This wastes a significant amount of computation, as all processors must redo all the work from that checkpoint onwards. In addition, recovery time is at least the time between the last checkpoint and the crash. Protocols based on message logging avoid rolling back all processors to their earlier state. However, the recovery time of existing message logging protocols is also no smaller than the time between the last checkpoint and the crash. In this paper, we present a fault tolerance protocol that provides fast restarts by combining message logging with object-based processor virtualization. We evaluate our implementation of the protocol in the Charm++/Adaptive MPI runtime system. We show that our protocol provides fast restarts and, for many applications, has low fault-free overhead.
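
    The fast-restart intuition, that only the crashed processor rolls back while its peers replay messages from their send logs, can be shown with a toy sketch. The types and methods below are hypothetical; this is not the Charm++/AMPI protocol, which must also handle nondeterminism, virtualized objects, and distributed checkpoints.

        // Toy sketch of sender-side message logging: a crashed process restores its
        // checkpoint and has peers replay logged messages, so only it redoes work.
        #include <cstdio>
        #include <map>
        #include <string>
        #include <vector>

        struct Message { int seq; std::string payload; };

        struct Process {
            int id;
            int state;                                    // toy application state
            int checkpoint;                               // last checkpointed state
            std::map<int, std::vector<Message>> sent_log; // per-destination send log

            void deliver(const Message& m) { state += (int)m.payload.size(); }

            void send(Process& dst, const std::string& payload) {
                Message m{(int)sent_log[dst.id].size(), payload};
                sent_log[dst.id].push_back(m);            // log the message when sending
                dst.deliver(m);
            }

            void take_checkpoint() { checkpoint = state; }

            // Recovery: restore the checkpoint, then replay, in order, every message
            // peers had sent to this process; the peers themselves keep running.
            void recover(const std::vector<Process*>& peers) {
                state = checkpoint;
                for (Process* p : peers)
                    for (const Message& m : p->sent_log[id]) deliver(m);
            }
        };

        int main() {
            Process a{0, 0, 0, {}}, b{1, 0, 0, {}};
            std::vector<Process*> peers_of_b{&a};

            b.take_checkpoint();
            a.send(b, "work-1");
            a.send(b, "work-22");
            int before_crash = b.state;

            b.state = -1;                 // simulate a crash that loses b's state
            b.recover(peers_of_b);        // replay from a's log instead of recomputing
            std::printf("state before crash=%d, after recovery=%d\n",
                        before_crash, b.state);
            return 0;
        }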